Evolving complex networks with conserved clique distributions 
We propose and study a hierarchical algorithm to generate graphs having a predetermined distribution of cliques, the fully connected subgraphs. The construction mechanism may be either random or incorporate preferential attachment. We evaluate the statistical properties of the graphs generated, such as the degree distribution and network diameters, and compare them to some real-world graphs.



I. INTRODUCTION 

The structural and statistical properties of networks have been studied intensively over the last decade [1, 2], due to their ubiquitous importance in technology, in different realms of life and in complex-system theory in general. With time it was realized that the topological properties of real-world networks often transcend the universality class both of the straightforward, all-random Erdos-Renyi graph and of random networks with arbitrary degree distributions.

Many real-world networks have a well-defined community structure. A community is, loosely speaking, a subgraph whose internal link density is substantially above the average link density of the whole network. A community with link density equal to one is called a 'clique' in graph theory: a clique is a fully interconnected subgraph, the smallest clique having just two vertices.

A clique is also a specific realization of a graph motif, i.e. of a subgraph with a definite topology [3], and of a $k$-core, viz a subgraph in which every vertex has at least $k$ links to other vertices of the subgraph. In related work Derenyi et al. have introduced the notion of clique percolation in the context of overlapping graph communities [10]. For scale-free graphs with a degree distribution $p_k \sim k^{-\gamma}$, the second moment $\langle k^2 \rangle$ diverges for the important case $2 < \gamma < 3$, and cliques of arbitrary size emerge in finite numbers.

For any graph one can define a characteristic clique distribution $P_c(S)$, viz the probability for a clique of size $S$ to occur. A loopless graph has exclusively cliques of size two, with $P_c(S) = \delta_{S,2}$, and the number of 3-site cliques is related to the standard clustering coefficient [1, 2]. The clustering coefficient $C$ is a normalized measure for the occurrence of 3-site loops, with every 3-site loop being part of at least one clique of size $S \ge 3$.
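To make the definition of $P_c(S)$ concrete, here is a small Python sketch. We assume, as the loopless example implies, that "cliques" means maximal cliques, and we ignore isolated vertices. The helpers `from_edges`, `maximal_cliques` and `clique_distribution` are hypothetical names. The enumeration uses the standard Bron-Kerbosch recursion with pivoting, which is not a procedure taken from this paper.

```python
from collections import Counter


def from_edges(edges):
    """Build an undirected graph {vertex: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """List all maximal cliques (Bron-Kerbosch recursion with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        # Branch only on vertices that are not adjacent to the pivot.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


def clique_distribution(adj):
    """P_c(S): fraction of maximal cliques having S >= 2 sites."""
    counts = Counter(len(c) for c in maximal_cliques(adj) if len(c) >= 2)
    total = sum(counts.values())
    return {s: n / total for s, n in sorted(counts.items())}


# A triangle (0,1,2) with a dangling path 2-3-4: one 3-clique, two 2-cliques.
print(clique_distribution(from_edges([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])))
# A loopless graph (a path) has only 2-site cliques.
print(clique_distribution(from_edges([(0, 1), (1, 2), (2, 3)])))  # {2: 1.0}
```

The second example reproduces $P_c(S) = \delta_{S,2}$ for a loopless graph.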

It is therefore of interest to investigate the clique distribution of real-world graphs and to consider the problem of constructing graphs with specific clique distributions.



II. ALGORITHM 

FIG. 1: Illustration of a clique-conserving algorithm generating a connected graph out of a given set of cliques. Starting with a 5-site clique (1,2,3,4,5) in step one, a 4-site clique (5,6,7,8) and a 3-site clique (8,9,10) are added in steps two and three via a single common vertex.

We consider a given set of cliques $C_1, \dots, C_M$ containing $S_i = S(C_i)$ sites each, an instantiation of a certain clique distribution $P_c(S)$. We presume the clique set to be monotonically ordered,

$$S_1 \ge S_2 \ge \dots \ge S_M , \tag{1}$$

as illustrated in Fig. 1. We study the task of recursively generating a dense and connected graph out of the $M$ cliques $\{C_i\}$ in such a way that the final graph has exactly the same distribution $P_c(S)$ of fully connected subgraphs, viz of cliques. In Fig. 1 we illustrate the simplest procedure for solving this task: concatenating the cliques $C_1, C_2, \dots$ via a single common vertex between two consecutive cliques.
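Below is a minimal Python sketch of the concatenation procedure of Fig. 1. It assumes, as in the figure, that the last vertex of each clique is the one shared with the next clique. `chain_cliques` is a hypothetical helper name. Because every clique after the first reuses exactly one existing vertex, $M$ cliques yield $\sum_i S_i - (M - 1)$ vertices.

```python
def chain_cliques(sizes):
    """Concatenate cliques through one shared vertex each, as in Fig. 1.

    Vertices are labelled 1, 2, ... as in Fig. 1; returns (adjacency, cliques),
    with the adjacency given as {vertex: set(neighbours)}.
    """
    adj = {}
    cliques = []
    next_label = 1
    shared = None  # last vertex of the previous clique
    for s in sizes:
        if shared is None:
            members = list(range(next_label, next_label + s))
        else:
            # Reuse one existing vertex and add s - 1 new ones.
            members = [shared] + list(range(next_label, next_label + s - 1))
        next_label = members[-1] + 1
        for u in members:
            adj.setdefault(u, set()).update(w for w in members if w != u)
        cliques.append(members)
        shared = members[-1]
    return adj, cliques


adj, cliques = chain_cliques([5, 4, 3])
print(cliques)   # [[1, 2, 3, 4, 5], [5, 6, 7, 8], [8, 9, 10]]
print(len(adj))  # 10 = (5 + 4 + 3) - (3 - 1)
```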

Let us briefly digress and consider what would have happened if, in the example of Fig. 1, we had used sites 4 and 7, together with a new site 9, to attach the $S_3 = 3$ clique in the third step. In this case sites 4 and 7 would be connected and a spurious 3-site clique, namely (4,5,7), would have been generated. A careless attachment of cliques therefore generally produces spurious additional cliques, resulting in an uncontrolled clique distribution for the final graph.
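The digression can be checked mechanically. The following Python sketch builds the graph of Fig. 1 after two steps, attaches the 3-site clique carelessly via sites 4 and 7 plus the new site 9, and lists every 3-site loop that is not contained in one of the intended cliques. The helpers `add_clique` and `triangles` are hypothetical names.

```python
from itertools import combinations


def add_clique(adj, members):
    """Mutually connect all vertices in `members`, creating them if needed."""
    for u, v in combinations(members, 2):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)


def triangles(adj):
    """Return all 3-site loops as sorted tuples."""
    found = set()
    for u in adj:
        for v in adj[u]:
            for w in adj[u] & adj[v]:
                found.add(tuple(sorted((u, v, w))))
    return found


# Graph of Fig. 1 after steps one and two.
adj = {}
add_clique(adj, [1, 2, 3, 4, 5])
add_clique(adj, [5, 6, 7, 8])

# Careless third step: attach the 3-site clique via sites 4 and 7.
add_clique(adj, [4, 7, 9])

intended = [{1, 2, 3, 4, 5}, {5, 6, 7, 8}, {4, 7, 9}]
spurious = sorted(t for t in triangles(adj)
                  if not any(set(t) <= c for c in intended))
print(spurious)  # [(4, 5, 7)]
```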



FIG. 2: Illustration of the dense hierarchical algorithm for generating dense graphs out of a given set of cliques. Starting in step 1 with the largest clique (1,2,3,4,5), here of size $S_1 = 5$, cliques of size $S_i$ ($i = 2, 3, \dots$) are added consecutively, one additional vertex per step, using $S_i - 1$ vertices from a previously added clique. Here the second clique is (3,4,5,6) and the third clique (4,6,7). Both random and preferential attachment may be used.

A. Hierarchical algorithm 

In general one can join two cliques of sizes $S_1$ and $S_2$ via common vertices. The minimal number of common vertices is one, the maximal is

$$\min(S_1, S_2) - 1 . \tag{2}$$

Using more common sites, namely $\min(S_1, S_2)$, would make the smaller clique a subset of the larger one and thereby destroy it. We can then formulate a class of hierarchical algorithms conserving a given initial clique distribution, which is arbitrary but ordered according to Eq. (1):

[1] At step $m = 1, \dots, M$ one adds the clique $C_m$ with $S_m = S(C_m)$ sites; for $m = 1$ the clique $C_1$ simply constitutes the initial graph segment. For $m > 1$ one starts by selecting a number $\tilde S_m \in [1, S_m - 1]$. Here we will mostly concentrate on the case $\tilde S_m = S_m - 1$.

[2] Next one recursively selects $\tilde S_m$ mutually interconnected vertices out of the graph segment constructed in the previous $m - 1$ steps. The new clique is then added by mutually connecting $S_m - \tilde S_m$ new sites among themselves and with the $\tilde S_m$ selected sites of the existing graph segment.

We call the choice $\tilde S_m = S_m - 1$ the 'dense hierarchical algorithm'; it is illustrated in Fig. 2. Here we will study exclusively the dense algorithm, which results in quite dense networks. The opposite limit, namely the case $\tilde S_m = 1$ in step [1] of the hierarchical algorithm, is illustrated in Fig. 1.

Starting with $M$ cliques, the dense hierarchical algorithm generates a network containing $N$ sites in its final state, with

$$N = S_1 + (M - 1) , \tag{3}$$

where $S_1$ is the size of the starting clique, which is also the largest. This holds because exactly one new vertex is added in each of the $M - 1$ subsequent steps.
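The dense variant can be sketched in Python as follows. Each selected vertex is drawn uniformly from the admissible candidates. That is our reading of "random attachment", not a rule stated in the text. `dense_hierarchical` and `maximal_clique_sizes` are hypothetical names. The usage lines check the vertex count $N = S_1 + (M - 1)$ and check that the maximal-clique sizes of the output reproduce the input list.

```python
import random
from collections import Counter
from itertools import combinations


def dense_hierarchical(sizes, rng):
    """Sketch of the dense hierarchical algorithm with random attachment.

    `sizes` lists the clique sizes S_1 >= S_2 >= ... >= S_M. Vertices are
    labelled 0, 1, 2, ...; the graph is returned as {vertex: set(neighbours)}.
    """
    if any(a < b for a, b in zip(sizes, sizes[1:])):
        raise ValueError("clique sizes must be ordered from largest to smallest")
    # Step 1: the largest clique forms the initial graph segment.
    adj = {v: set() for v in range(sizes[0])}
    for u, v in combinations(range(sizes[0]), 2):
        adj[u].add(v)
        adj[v].add(u)
    for s in sizes[1:]:
        # Pick s - 1 mutually connected vertices one by one; each pick is drawn
        # uniformly (an assumption) from the vertices linked to all earlier picks.
        candidates = set(adj)
        chosen = []
        for _ in range(s - 1):
            v = rng.choice(sorted(candidates))
            chosen.append(v)
            candidates &= adj[v]
        # Add exactly one new vertex and link it to every chosen vertex.
        new = len(adj)
        adj[new] = set(chosen)
        for v in chosen:
            adj[v].add(new)
    return adj


def maximal_clique_sizes(adj):
    """Count maximal cliques by size (Bron-Kerbosch with pivoting)."""
    counts = Counter()

    def expand(size, p, x):
        if not p and not x:
            counts[size] += 1
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(size + 1, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(0, set(adj), set())
    return counts


sizes = [5, 4, 4, 3, 3, 3, 2, 2]
graph = dense_hierarchical(sizes, random.Random(1))
print(len(graph))                                     # 12 = 5 + (8 - 1)
print(maximal_clique_sizes(graph) == Counter(sizes))  # True
```

Every new edge touches the single new vertex, so the only clique that can appear is the intended one. The ordering guarantees that no earlier clique is absorbed, which is why the check succeeds for any seed.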

B. Random vs. preferential attachment 

The selection of the $\tilde S_m$ vertices in step [2] can be done randomly, by preferential attachment or by other rules. When considering preferential attachment we first select a single vertex $i$ with an attachment probability $\Pi(k_i)$ proportional to the vertex degree $k_i$,

$$\Pi(k_i) = \frac{k_i}{\sum_j k_j} \tag{4}$$

(linear preferential attachment). We then recursively select $\tilde S_m - 1$ vertices out of the neighbors of $i$ via preferential attachment. At every step of this recursive selection process, the set of possible vertices is given by the set of vertices linked to all sites previously selected. Note that the ordering (1) of the initial clique distribution is a precondition for the hierarchical algorithm to function. Since every previously added clique contains at least $S_m$ sites, a set of $\tilde S_m \le S_m - 1$ mutually connected vertices can always be found, and it can never make up an entire earlier clique, which would otherwise be absorbed into the new one.
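The recursive preferential selection might look like the following Python sketch. The first vertex is drawn with probability $k_i / \sum_j k_j$. Every further vertex is drawn proportionally to its degree from the vertices linked to all previous picks. `select_preferential` is a hypothetical name, and taking the degree in the current graph segment as the weight is our assumption.

```python
import random


def select_preferential(adj, count, rng):
    """Select `count` mutually connected vertices by linear preferential attachment.

    adj is {vertex: set(neighbours)}. The first vertex i is drawn with
    probability k_i / sum_j k_j; every further vertex is drawn, again
    proportionally to its degree, among vertices linked to all earlier picks.
    """
    candidates = list(adj)
    chosen = []
    for _ in range(count):
        if not candidates:
            raise ValueError("no vertex is linked to all previous selections")
        degrees = [len(adj[v]) for v in candidates]
        v = rng.choices(candidates, weights=degrees, k=1)[0]
        chosen.append(v)
        # Restrict to common neighbours of everything selected so far.
        candidates = [u for u in candidates if u in adj[v]]
    return chosen


# Cliques (0,1,2,3) and (3,4,5) sharing vertex 3.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {0, 1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(select_preferential(adj, 3, random.Random(0)))
```

With this graph every existing clique has at least 3 sites, so selecting 3 vertices never runs out of candidates. The ordering precondition guarantees the same during graph construction.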

C. Decimation algorithm 

For further reference we briefly mention a second clique-conserving algorithm for network construction via vertex decimation. Starting with an initial network of $M$ unconnected cliques $C_1, \dots, C_M$, one selects pairs of unconnected vertices either randomly or via preferential attachment. One then attempts a decimation by merging the two selected vertices into a single vertex and calculates the clique distribution of the new network, which has one site fewer. If the new clique distribution is identical to the original one, the decimation is accepted; otherwise it is rejected.
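A single decimation attempt can be sketched in Python as below. We compare the absolute numbers of maximal cliques of each size before and after the merge. That check is slightly stricter than comparing normalized distributions, and it is our assumption. `merge_vertices` and `try_decimation` are hypothetical names, and the choice of the vertex pair is left to the caller.

```python
from collections import Counter


def clique_size_counts(adj):
    """Count maximal cliques by size (Bron-Kerbosch with pivoting)."""
    counts = Counter()

    def expand(size, p, x):
        if not p and not x:
            counts[size] += 1
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(size + 1, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(0, set(adj), set())
    return counts


def merge_vertices(adj, a, b):
    """Return a copy of `adj` in which unconnected vertex b is merged into a."""
    merged = {v: set(nbrs) for v, nbrs in adj.items() if v != b}
    for u in adj[b]:
        merged[u].discard(b)
        merged[u].add(a)
        merged[a].add(u)
    return merged


def try_decimation(adj, a, b):
    """One decimation attempt; returns (graph, accepted)."""
    if a == b or b in adj[a]:
        raise ValueError("only distinct, unconnected vertices may be merged")
    merged = merge_vertices(adj, a, b)
    if clique_size_counts(merged) == clique_size_counts(adj):
        return merged, True
    return adj, False


# Two disjoint triangles: merging 2 and 3 keeps both 3-site cliques.
triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
graph, accepted = try_decimation(triangles, 2, 3)
print(accepted, len(graph))  # True 5

# Path 0-1-2-3 (three 2-site cliques): merging 0 and 3 would close a triangle.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(try_decimation(path, 0, 3)[1])  # False
```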

III. SIMULATION RESULTS 

We have studied the properties of the hierarchical clique-conserving graph-generation algorithm extensively by numerical simulations, evaluating the statistical properties of the generated graphs and comparing them to some selected real-world graphs.

A. Initial clique distribution 

The hierarchical graph-generation algorithm conserves the initial clique distribution $P_c(S)$ by construction. We have studied two cases. In Sect. IV we will discuss the results obtained by using the measured clique distribution of real-world networks for $P_c(S)$. Here we will concentrate on model clique distributions, in particular of the scale-free form

$$P_c(S) \sim \left(\frac{1}{S}\right)^{\alpha}, \qquad \alpha > 2 . \tag{5}$$

FIG. 3: The degree distributions $p_k$ for graphs with a scale-free clique distribution, compare Eq. (5), and an exponent $\alpha = 2.6$. Blue squares are for a system of $M \approx 10^5$ cliques; green stars and red diamonds show systems with $10^4$ and $10^3$ cliques respectively. The data is obtained by averaging over 1000/263/19 realizations of $P_c(S)$ for clique numbers $M$ equal to $10^3/10^4/10^5$.

We performed simulations for various exponents $\alpha$, with scale-free clique distributions containing a total number $M$ of cliques. For the simulations a cut-off $S_1$, i.e. the maximal clique size, needs to be chosen for the scale-free distribution (5). The expected number $N_{S_1}(S)$ of cliques of size $S$ is then

$$N_{S_1}(S) = M\,\frac{(1/S)^{\alpha}}{\sum_{S'=2}^{S_1} (1/S')^{\alpha}} , \tag{6}$$

where $M$ is the total number of cliques. We selected $S_1$ by the condition

$$N_{S_1}(S_1) \ge 1, \qquad N_{S_1}(S_1 + 1) < 1 , \tag{7}$$

viz that on average at least one clique of size $S_1$ is present. We compared results obtained for $M$ typically ranging from $10^3$ to $10^5$, in order to extract scaling properties in the large-network limit. To obtain reliable statistical properties the results were averaged over $N_{\rm real}$ different random realizations.
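Here is a Python sketch of how the cut-off $S_1$ of condition (7) might be chosen, and how one random realization of the clique set could then be drawn. We assume that the expected number of cliques of size $S$ is $M S^{-\alpha}$, normalized over $S = 2, \dots, S_1$. We also assume that the $M$ clique sizes of a realization are drawn independently from this truncated power law and then sorted from largest to smallest. Neither detail is spelled out in the text, and `cutoff` and `sample_clique_sizes` are hypothetical names.

```python
import random


def cutoff(alpha, M, s_max=100_000):
    """Return the first S_1 with N(S_1) >= 1 and N(S_1 + 1) < 1 (condition (7)).

    N(S) = M * S**-alpha / sum_{S'=2}^{S_1} S'**-alpha  (assumed normalization).
    """
    norm = 0.0
    for s1 in range(2, s_max):
        norm += s1 ** -alpha                     # running sum up to S_1
        n_at = M * s1 ** -alpha / norm           # expected count at S_1
        n_next = M * (s1 + 1) ** -alpha / norm   # expected count at S_1 + 1
        if n_at < 1:
            break
        if n_next < 1:
            return s1
    raise ValueError("no cut-off satisfies condition (7)")


def sample_clique_sizes(alpha, M, rng):
    """Draw M clique sizes from the truncated power law, largest first."""
    s1 = cutoff(alpha, M)
    support = range(2, s1 + 1)
    weights = [s ** -alpha for s in support]
    sizes = rng.choices(support, weights=weights, k=M)
    return sorted(sizes, reverse=True)


print(cutoff(2.6, 10**4))
sizes = sample_clique_sizes(2.6, 1000, random.Random(0))
print(sizes[:5], len(sizes))
```

The sorted output is directly usable as the ordered clique set required by the hierarchical algorithm.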

When selecting the value $S_1$ for the maximal clique size one discards all cliques with sizes $S > S_1$. This is admissible when the percentage of discarded cliques is small. With the criterion (7), the percentage of discarded cliques vanishes in the thermodynamic limit $M \to \infty$. For systems with $10^4$ and $10^5$ cliques, the percentage of discarded cliques is well below 1%.



FIG. 4: The degree distribution $p_k$ for graphs having a scale-free clique distribution with $\alpha = 2.6$ and $M \approx 10^4$ cliques. Shown are results both for random and for preferential attachment, with lines indicating the respective slopes $-2.7$ (random) and $-2.5$ (preferential).



B. System-size analysis 

In Fig. 3 we present the degree distribution $p_k$ for graphs with a scale-free clique distribution (5) and an exponent $\alpha = 2.6$, generated through the hierarchical algorithm with preferential attachment. The degree distribution results from averaging over $N_{\rm real} = 1000, 263, 10$ realizations for clique distributions containing $M \approx 10^3, 10^4, 10^5$ cliques. We note that the degree distribution approaches a well-defined curve in the thermodynamic limit $M \to \infty$.

The degree distributions shown in Fig. 3 have bumps at high degrees for finite numbers of cliques $M$. This is because the algorithm incorporates the large cliques first. Vertices with a high initial degree therefore see their degree further increased via preferential attachment during the construction process. This effect vanishes in the thermodynamic limit, as the probability of a given vertex being chosen as part of a new clique decreases with system size.

TABLE I: Statistical properties of graphs (compare Fig. 3) containing $M \approx 10^3, 10^4, 10^5$ cliques generated by the hierarchical algorithm with preferential attachment, using a scale-free clique distribution (5) with an exponent $\alpha = 2.6$. $C$ is the clustering coefficient, $\ell$ the average path length, $\langle k \rangle$ the average degree, $D$ the network diameter, $d$ the link density and $N$ the total number of vertices; $m$ is the slope of the degree distribution $p_k$ measured for $k \in [10, 40]$ for $M \approx 10^3$ and $k \in [10, 100]$ for $M \approx 10^4, 10^5$.

FIG. 5: The degree distribution for three scale-free clique distributions with exponents $\alpha_1 = 2.1$ (maroon plus signs, 11 simulation runs), $\alpha_2 = 2.6$ (red diamonds, 26 simulation runs) and $\alpha_3 = 3.2$ (yellow triangles, 15 simulation runs), for $M \approx 10^5$ cliques.

The statistical analysis of the networks presented in Fig. 3 is given in Table I. The number of cliques $M$ and the number of vertices $N$ obey the relation (3) valid for the dense hierarchical algorithm. Within the numerical errors, the resulting degree distribution approaches a scale-free functional dependence. Its exponent $|m|$ is approximately given by the exponent $\alpha = 2.6$ of the conserved clique distribution $P_c(S)$.
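The tables quote a slope $m$ of $p_k$ over a degree window such as $k \in [10, 100]$. The text does not say how the fit was made. One common choice is an ordinary least-squares fit of $\log p_k$ against $\log k$ over the window, sketched below in Python as an assumption rather than as the authors' procedure. `degree_distribution` and `loglog_slope` are hypothetical names.

```python
import math


def degree_distribution(degrees):
    """Return p_k as {k: fraction of vertices with degree k}."""
    counts = {}
    for k in degrees:
        counts[k] = counts.get(k, 0) + 1
    n = len(degrees)
    return {k: c / n for k, c in counts.items()}


def loglog_slope(pk, k_min, k_max):
    """Least-squares slope m of log p_k versus log k for k in [k_min, k_max]."""
    pts = [(math.log(k), math.log(p)) for k, p in pk.items()
           if k_min <= k <= k_max and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two degrees in the fitting window")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


# Synthetic check: an exact power law p_k ~ k^-2.5 gives m = -2.5.
pk = {k: k ** -2.5 for k in range(1, 200)}
print(round(loglog_slope(pk, 10, 100), 6))  # -2.5
```

For sampled data, sparse high-degree bins make such an unweighted fit noisy. Logarithmic binning before fitting is a common refinement.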

In Fig. 4 we compare the degree distributions obtained with preferential and with random attachment. The difference is quite small in the region of small to intermediate degrees $k$, where finite-size corrections are minor. The reason is the algorithmic restriction that only common neighbors of the already processed vertices can be used to construct a clique iteratively. This restriction decreases the number of vertices available for preferential attachment. It results in similar degree distributions, which however deviate slightly from the ideal scale-free line.



C. Dependence on the scaling exponent

We have studied the properties of the graphs generated by the hierarchical algorithm for scale-free clique distributions $P_c(S)$ and several scaling exponents $\alpha$. We have analyzed the corresponding graphs as a function of the clique number $M \approx 10^3, 10^4, 10^5$, averaging over several clique-distribution realizations. The resulting degree distributions are shown in Fig. 5 for the case $M \approx 10^5$, and the corresponding statistical analysis is given in Table II. In order to estimate the finite-size corrections we present the corresponding results for $M \approx 10^4$ in Table III. We note, in particular, good agreement between the estimates for the scaling exponent $|m|$ of the resulting degree distribution.

Interestingly, the exponent $|m|$ of the degree distribution of the graphs generated by the hierarchical algorithm with preferential attachment saturates at $\approx 3.1$. This is close to the value 3 expected for the standard preferential attachment algorithm. When $\alpha < 3$, the large-degree tail stemming directly from the clique distribution dominates the resulting exponent $|m|$ of the degree distribution. It fails to do so for $\alpha > 3$, when the preferential attachment mechanism dominates the generation of the fat tail.

TABLE II: Statistical properties of graphs (compare Fig. 5) containing $M \approx 10^5$ cliques generated by the hierarchical algorithm with preferential attachment. $\alpha$ denotes the scaling exponent of the clique distribution $P_c(S)$, $C$ the clustering coefficient, $\ell$ the average path length, $\langle k \rangle$ the average degree, $D$ the network diameter, $d$ the link density and $N$ the total number of vertices; $m$ is the slope of the degree distribution $p_k$ measured for $k \in [10, 100]$.

Next, we comment on the size of the network diameter $D$ of the generated graphs. With increasing $\alpha$ we observe an increasing average path length $\ell$ and an increasing average diameter $D$, while the clustering coefficient $C$ decreases. Intuitively, the network diameter is affected by the number of low-degree vertices. For degree distributions of identical functional form, a larger number of low-degree vertices generally results in a larger network diameter. Alternatively, one may consider the number of trivial cliques, namely those of size $S = 2$, viz edges not forming part of any larger clique. They tend to connect low-degree vertices, since two connected high-degree vertices would have a higher probability of belonging to a clique of size 3 or larger.

In order to examine the influence of these trivial cliques on the network diameter, we have eliminated all cliques of size $S = 2$ from the graphs generated by the hierarchical algorithm with $M \approx 10^4$ and $\alpha = 2.1, 2.6, 3.2, 4.2$. The statistical properties of the resulting graphs are given in the right part of Table III. The network diameter $D$ and the average path length $\ell$ decrease substantially, while the clustering $C$ increases. We note that the scaling exponent $m$ of the degree distribution remains unaffected, as it depends only on the vertices with large degrees. This result is nevertheless somewhat surprising in view of the dramatic reduction in the number of vertices $N$ resulting from the removal of all trivial cliques.
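Removing the trivial cliques is simple once one notes that an edge is a maximal 2-site clique exactly when its two endpoints have no common neighbour. The Python sketch below deletes all such edges and then drops vertices left without links. Dropping isolated vertices is our assumption, consistent with the reduction of $N$ reported in Table III. `remove_trivial_cliques` is a hypothetical name.

```python
def remove_trivial_cliques(adj):
    """Delete every edge forming a clique of size S = 2, then isolated vertices.

    adj is {vertex: set(neighbours)} with orderable vertex labels. An edge
    (u, v) is a trivial clique when u and v share no neighbour, i.e. when the
    edge lies in no 3-site loop. Removing such edges cannot change whether
    any other edge lies in a 3-site loop, so one pass suffices.
    """
    trivial = [(u, v) for u in adj for v in adj[u]
               if u < v and not (adj[u] & adj[v])]
    pruned = {v: set(nbrs) for v, nbrs in adj.items()}
    for u, v in trivial:
        pruned[u].discard(v)
        pruned[v].discard(u)
    return {v: nbrs for v, nbrs in pruned.items() if nbrs}


# Triangle (0,1,2) with a dangling path 2-3-4 made of two trivial cliques.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(remove_trivial_cliques(adj))  # {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```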



IV. COMPARISON WITH REAL-WORLD DATA

We have evaluated the clique distributions $P_c(S)$ for two real-world networks, a protein-protein interaction network [14] and a WWW graph [12]. We then used the resulting clique distributions $P_c(S)$ as the starting point for the hierarchical algorithm with preferential attachment, and compared the graphs thus generated with the properties of the original real-world networks.

TABLE III: Left table: Statistical properties of graphs with $M \approx 10^4$ cliques and various scaling exponents $\alpha$ for the clique distribution $P_c(S)$. $C$ is the clustering coefficient, $\ell$ the average path length, $\langle k \rangle$ the average degree, $D$ the network diameter, $d$ the link density, $N$ the total number of vertices and $m$ the slope measured between degree 10 and 60. The degree distributions result from averaging over $N_{\rm real} = 86, 263, 866, 306$ realizations for clique distributions having $\alpha = 2.1, 2.6, 3.2, 4.2$. Right table: The same data as for the left table, but with all cliques of size $S = 2$ removed from the graphs.

Left table (all cliques):

α     C     ℓ     D      ⟨k⟩    d         N       m
2.1   0.51  3.1   7.5    10.5   0.00104   10093   -2.0
2.6   0.36  3.4   8.8    5.6    0.00056   10032   -2.5
3.2   0.23  3.7   10.0   3.8    0.00038   10017   -2.9
4.2   0.10  4.1   11.8   2.7    0.00027   10007   -3.1

Right table (cliques of size S = 2 removed):

α     C     ℓ     D      ⟨k⟩    d         N       m
2.1   0.94  2.73  4.1    16.7   0.0028    5885    -2.0
2.6   0.92  2.77  4.5    10.0   0.0021    4625    -2.5
3.2   0.90  2.77  4.9    7.2    0.00206   3491    -3.0
4.2   0.97  2.74  5.0    5.5    0.0024    2207    -3.0

Fig. 6 shows the clique and degree distributions of the respective original graphs; their statistical properties are given in Table IV. We note that the protein-interaction graph contains cliques of up to ten sites, whereas the typical clique size is slightly larger in the WWW net. The scaling of the degree distribution $p_k$ is clearly observable for the WWW net, but only indicative for the protein-interaction network, due to the limited number of vertices it contains.

In Table IV we have also included the properties of the graphs generated by the hierarchical algorithm using preferential attachment. The main difference between the generated networks analyzed in Table IV and those previously discussed is that they are not averaged over an ensemble of realizations of a clique distribution. The reason is that the exact experimental clique distributions of the protein-interaction network and of the WWW network have been taken as input for the hierarchical algorithm, which conserves the clique distribution by construction.

Next we note two caveats with respect to the protein-interaction graph. Firstly, it is not complete, being updated continuously as new experimental results become available [14]. Secondly, the protein-interaction network contains unconnected subsets of vertices: the largest component does not encompass the entire graph, but only 8972 out of a total of 9362 vertices. We have used this largest component for the data analysis.
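Restricting the analysis to the largest component, here 8972 of 9362 vertices, amounts to a breadth-first search over all components. Below is a minimal Python sketch, assuming the network is stored as `{vertex: set(neighbours)}`. `largest_component` and `induced_subgraph` are hypothetical names.

```python
from collections import deque


def largest_component(adj):
    """Return the vertex set of the largest connected component (BFS)."""
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best


def induced_subgraph(adj, vertices):
    """Keep only the given vertices and the links among them."""
    return {v: adj[v] & vertices for v in vertices}


adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
core = largest_component(adj)
print(sorted(core))  # [0, 1, 2]
print(induced_subgraph(adj, core))
```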

Analyzing the data presented in Table IV, we note substantial differences between the properties of the real-world graphs and those of the graphs generated by the hierarchical clique-conserving algorithm. These differences involve essentially all key statistical quantities, such as the total number of vertices, the average degree, the network diameter and the large-$k$ falloff of the degree distribution.

This leaves us with two possible conclusions. The first is that the clique distribution $P_c(S)$ is probably not a good quantity for characterizing a given graph, at least in the two examples considered here. The second is that an altogether different clique-conserving algorithm may be needed for the clique distribution to serve as a characterizing quantity. The data presented in Table IV was generated using the hierarchical algorithm with preferential attachment. However, as discussed above (see Fig. 4), the difference between random and preferential attachment is actually quite small for clique distributions having a fat tail.

TABLE IV: Statistical properties of an HPPI and of a WWW graph. $C$ is the clustering coefficient, $\ell$ the average path length, $\langle k \rangle$ the average degree, $D$ the diameter, $d$ the link density, $N$ the total number of vertices and $m$ the slope measured for $k \in [10, 44]$ for the real data ($k \in [10, 20]$ for the generated HPPI graph and $k \in [10, 100]$ for the generated WWW graph).



V. DISCUSSION 

In this paper we presented an algorithm, the hierarchical algorithm, by which one can generate graphs having a predetermined distribution of cliques, viz of fully connected subgraphs. As a first step, we studied the degree distribution of the resulting networks for scale-free clique distributions as a function of the scaling exponent.

As a second step, we used two selected real-world graphs, a protein-interaction network and a WWW network. We compared their degree and clique distributions with those of graphs generated via the hierarchical algorithm having the same respective clique distributions. We find no good agreement. This leads us to conclude that either the clique distribution is insufficient for an in-depth characterization of real-world networks, or the hierarchical algorithm needs further development.
FIG. 6: Left figure: Clique distribution $P_c(S)$ of the WWW data set [12] and of the Human Protein-Protein Interaction Database (HPPI) [13]. The distributions have the exponents $\alpha_{\rm WWW} = -5.5$ and $\alpha_{\rm HPPI} = -6.2$. The statistical properties are given in Table IV.
Right figure: Degree distribution $p_k$ of the same data shown in the left figure. Continuous lines show the respective slopes $m_{\rm WWW} = -2.8$ and $m_{\rm HPPI} = -2.5$. The statistical properties are given in Table IV.